This is a practice of RNA-seq pathway analysis using a small portion (6 out of 156) of GEO GSE135378 dataset. Differential expression analysis was done with DESeq2. Pathway analyses were performed for GO ontology and MSigDb gene sets.
Related paper: Pappalardi MB, et al. https://doi.org/10.1038/s43018-021-00249-x
The paper describes the discovery and systematic characterization of a novel small molecule DNA methyltransferase (DNMT) 1 inhibitor GSK3685032 and its anti-tumor activity in acute myeloid leukemia cell lines. Decitabine (DAC), a nucleoside analog and classic non-specific DNMT inhibitor, was used for comparison of the effects on DNA demethylation, transcription changes, DNA damage, and tumor inhibitory activity but not on specific gene expression.
In this exercise, we used NCBI-processed raw count. From NCBI:
Briefly, SRA runs where the organism is Homo sapiens and type is Transcriptomic are aligned to genome assembly GCA_000001405.15 using HISAT2. Runs that pass a 50% alignment rate are further processed with Subread featureCounts which outputs a raw count file for each run. For Human data, the Homo sapiens Annotation Release 109.20190905 was used for gene annotation. GEO further processes these SRR raw count files into GEO Series raw counts matrices. … In cases where there is more than one SRA run per GEO Sample, the raw counts are summed.
The dataset used here was from leukemia-derived macrophage cell line MV4-11 cells treated with 400 nM decitabine (DAC) or GSK3685032 (GSK in short hereafter) for 4 days together with vehicle (DMSO) control. In original data, each treatment has two set of replicates, each of which includes six samples (six SRRs). The NCBI raw count matrix has six count columns for these samples, two for each condition (DMSO, GSK, and DAC) with corresponding GSM numbers as column names.
The results were consistent with broad effects of DNA demethylation. However, this is a technical practice not aiming to reproduce the original analyses or draw any biological insights.
library(GEOquery)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(DESeq2)
library(apeglm)
library(EDASeq)
library(limma)
library(clusterProfiler)
library(fgsea)
library(msigdbr)
library(ReactomePA)
library(vsn)
library(hexbin)
library(pheatmap)
library(dplyr)
library(stringr)
library(ggplot2)
library(RColorBrewer)
library(Glimma)
library(ggvenn)
library(plotly)
First, we download the NCBI raw count data and the parent GEO files for sample information.
Note:
1. GeneID (ENTREZID) are numbers;
2. Sample names are GSM*([0-9]) from GEO, not the actual sample
names.
Also download MSigDB gene set set for pathway analysis. (Have to download manually from the website using an account).
# setwd("/N/slate/wenschen/pappa-paper")
# NCBI raw count data
url = "https://www.ncbi.nlm.nih.gov/geo/download/?type=rnaseq_counts&acc=GSE135378&format=file&file=GSE135378_raw_counts_GRCh38.p13_NCBI.tsv.gz"
download.file(url, destfile = "GSE135378_raw_counts_GRCh38.p13_NCBI.tsv.gz")
gunzip("GSE135378_raw_counts_GRCh38.p13_NCBI.tsv.gz")
# MSigDB gene sets (https://www.gsea-msigdb.org/gsea/msigdb)
# Curated set
# url = "https://www.gsea-msigdb.org/gsea/msigdb/download_file.jsp?filePath=/msigdb/release/2024.1.Hs/c2.all.v2024.1.Hs.entrez.gmt"
# # Hallmark set
# url = "https://www.gsea-msigdb.org/gsea/msigdb/download_file.jsp?filePath=/msigdb/release/2024.1.Hs/h.all.v2024.1.Hs.entrez.gmt"
# Reactome CP
#url = "https://www.gsea-msigdb.org/gsea/msigdb/download_file.jsp?filePath=/msigdb/release/2024.1.Hs/c2.cp.wikipathways.v2024.1.Hs.entrez.gmt"
raw = read.table("GSE135378_raw_counts_GRCh38.p13_NCBI.tsv", header = TRUE)
print(dim(raw)) # 39376 157
## [1] 39376 157
print(head(raw)[, 1:10])
## GeneID GSM4007051 GSM4007052 GSM4007053 GSM4007054 GSM4007055 GSM4007056
## 1 100287102 32 26 24 24 43 38
## 2 653635 1240 1252 1183 1160 1500 1317
## 3 102466751 81 74 64 58 83 74
## 4 107985730 11 7 8 7 12 13
## 5 100302278 0 2 2 0 1 1
## 6 645520 1 0 0 0 0 0
## GSM4007057 GSM4007058 GSM4007059
## 1 24 21 51
## 2 1051 997 1546
## 3 52 53 84
## 4 7 4 5
## 5 0 0 0
## 6 0 0 0
We need to download the parent file for metadata.
data_matrix = getGEO(GEO = "GSE135378", destdir = ".", GSEMatrix = TRUE, getGPL = FALSE)
## Found 1 file(s)
## GSE135378_series_matrix.txt.gz
Prepare a sample info file for the count table, and a easy-to-read sample table for reference (can also be used for selecting data).
## file for count columns
sample_info = pData(phenoData(data_matrix[[1]]))[, c(1, 1)][1]
sample_info = data.frame(sample_info)
colnames(sample_info) = "details"
sample_info$details = sapply(sample_info, function(x) sub(" \\[.*", "", x))
write.table(sample_info, "sample_info.txt",
col.names = FALSE, quote = FALSE, sep = "\t")
## table
# Remove the "." in "3.2nM" for next step.
# Name the columns accordingly.
sample_info$details = sapply(sample_info$details,
function(x) sub("3.2", "32", x, fixed = TRUE))
sample_info = cbind(rownames(sample_info),
data.frame(str_split_fixed(sample_info$details, "\\.", 5)))
colnames(sample_info) = c("GSM", "cell_line", "dose", "drug", "day", "replicate")
# Get the "." back for "3.2nM".
sample_info$dose = as.numeric(gsub("nM", "", sample_info$dose))
sample_info$dose = sapply(sample_info$dose, function(x) ifelse(x == "32", "3.2", x))
sample_info$day = as.numeric(gsub("d", "", sample_info$day))
sample_info$replicate = as.numeric(gsub("n", "", sample_info$replicate))
# Rearrange the columns.
sample_info = sample_info[, c("GSM", "cell_line", "drug", "dose", "day", "replicate")]
write.table(sample_info, "sample_info_table.txt",
row.names = FALSE, quote = FALSE, sep = "\t")
print(head(sample_info))
## GSM cell_line drug dose day replicate
## 1 GSM4007051 MV411 DMSO 0 1 1
## 2 GSM4007052 MV411 GSK032 400 1 1
## 3 GSM4007053 MV411 DAC 400 1 1
## 4 GSM4007054 MV411 DMSO 0 2 1
## 5 GSM4007055 MV411 GSK032 400 2 1
## 6 GSM4007056 MV411 DAC 400 2 1
Only MV4-11, day4, DMSO, and 400 nM GSK or DAC treatment.
About the cell line (from ATCC):
MV-4-11 cells are macrophages that were isolated from the blast cells of a 10-year-old male with biphenotypic B-myelomonocytic leukemia and deposited by the Wistar Institute. Use these cells in cancer and immunology research.
sample_df = read.table("sample_info.txt")
colnames(sample_df) = c("GSM", "details")
## Check the corresponding names to make sure they match.
# The first column of data is GeneID.
# Total sample number is 156.
nrow(sample_df) == ncol(raw) - 1 # True
## [1] TRUE
all(sample_df$GSM == colnames(raw)[2:157]) # True
## [1] TRUE
print(sum(sample_df$GSM == colnames(raw)[2:157])) # 156
## [1] 156
## Replace and save.
colnames(raw)[2:157] = sample_df$details
# Take a look.
print(head(raw)[, 1:5])
## GeneID MV411.0nM.DMSO.d1.n1 MV411.400nM.GSK032.d1.n1 MV411.400nM.DAC.d1.n1
## 1 100287102 32 26 24
## 2 653635 1240 1252 1183
## 3 102466751 81 74 64
## 4 107985730 11 7 8
## 5 100302278 0 2 2
## 6 645520 1 0 0
## MV411.0nM.DMSO.d2.n1
## 1 24
## 2 1160
## 3 58
## 4 7
## 5 0
## 6 0
# Save the file.
write.table(raw, "GSE135378_raw_counts_GRCh38.p13_NCBI_renamed.tsv",
row.names = FALSE, quote = FALSE, sep = "/t")
# Make a smaller table with the data of interest for next step.
# Day 4, MV4-11, for the Venn diagram in Fig. 6c from the paper.
cols_mv4d4 = grep("MV411", colnames(raw), value = TRUE)
cols_mv4d4 = grep("d4", cols_mv4d4, value = TRUE)
mv4d4_raw = raw[, c("GeneID", cols_mv4d4)]
print(head(mv4d4_raw)[, 1:5])
## GeneID MV411.0nM.DMSO.d4.n1 MV411.3.2nM.GSK032.d4.n1
## 1 100287102 51 49
## 2 653635 1546 1419
## 3 102466751 84 63
## 4 107985730 5 5
## 5 100302278 0 0
## 6 645520 0 1
## MV411.16nM.GSK032.d4.n1 MV411.80nM.GSK032.d4.n1
## 1 49 48
## 2 1474 1553
## 3 79 99
## 4 10 7
## 5 1 1
## 6 0 0
# Let's make a smaller dataset: 400 nM and control only.
cols_mv4d4_400 = grepl("DMSO", colnames(mv4d4_raw)) | grepl("400nM", colnames(mv4d4_raw))
mv4d4_400_raw = cbind(GeneID = mv4d4_raw[, "GeneID"], mv4d4_raw[, c(cols_mv4d4_400)])
print(dim(mv4d4_400_raw)) # 39376 7
## [1] 39376 7
print(head(mv4d4_400_raw))
## GeneID MV411.0nM.DMSO.d4.n1 MV411.400nM.GSK032.d4.n1 MV411.400nM.DAC.d4.n1
## 1 100287102 51 54 78
## 2 653635 1546 1308 1570
## 3 102466751 84 80 74
## 4 107985730 5 16 18
## 5 100302278 0 3 0
## 6 645520 0 1 1
## MV411.0nM.DMSO.d4.n2 MV411.400nM.GSK032.d4.n2 MV411.400nM.DAC.d4.n2
## 1 30 46 79
## 2 925 856 2107
## 3 46 44 120
## 4 2 8 22
## 5 0 0 1
## 6 1 1 2
# Now we only have 6 samples, let's simplify the columns names.
print(colnames(mv4d4_400_raw))
## [1] "GeneID" "MV411.0nM.DMSO.d4.n1"
## [3] "MV411.400nM.GSK032.d4.n1" "MV411.400nM.DAC.d4.n1"
## [5] "MV411.0nM.DMSO.d4.n2" "MV411.400nM.GSK032.d4.n2"
## [7] "MV411.400nM.DAC.d4.n2"
# [1] "GeneID" "MV411.0nM.DMSO.d4.n1" "MV411.400nM.GSK032.d4.n1" "MV411.400nM.DAC.d4.n1"
# [4] "MV411.0nM.DMSO.d4.n2" "MV411.400nM.GSK032.d4.n2" "MV411.400nM.DAC.d4.n2"
colnames(mv4d4_400_raw) = c("GeneID", "DMSO_1", "GSK_1", "DAC_1", "DMSO_2", "GSK_2", "DAC_2")
mv4d4_400_raw2 = mv4d4_400_raw # Make a copy just in case
# Arrange the columns
mv4d4_400_raw = mv4d4_400_raw[, c("GeneID", "DMSO_1", "DMSO_2", "DAC_1", "DAC_2", "GSK_1", "GSK_2")]
print(colnames(mv4d4_400_raw))
## [1] "GeneID" "DMSO_1" "DMSO_2" "DAC_1" "DAC_2" "GSK_1" "GSK_2"
write.table(mv4d4_400_raw, "MV4-11_raw_counts_NCBI_d4_400nm.tsv",
row.names = FALSE, quote = FALSE, sep = "\t")
Sample names as index (row name);
Condition(treatment) as level for comparison.
sample_table = data.frame(condition = c("DMSO", "DMSO", "DAC", "DAC", "GSK", "GSK"))
rownames(sample_table) = colnames(mv4d4_400_raw[2:7])
sample_table$condition = as.factor(sample_table$condition)
sample_table$batch = as.factor(str_split_i(rownames(sample_table), "_", 2))
print(sample_table)
## condition batch
## DMSO_1 DMSO 1
## DMSO_2 DMSO 2
## DAC_1 DAC 1
## DAC_2 DAC 2
## GSK_1 GSK 1
## GSK_2 GSK 2
First, construct a DESeq dataset from the count matrix, sample table, and set the comparison condition. Include batch in the design formula.
# Construct the count matrix. Same as above, but remove the gene ID column name.
count_matrix = mv4d4_400_raw[, 2:7]
rownames(count_matrix) = mv4d4_400_raw[, 1]
print(head(count_matrix))
## DMSO_1 DMSO_2 DAC_1 DAC_2 GSK_1 GSK_2
## 100287102 51 30 78 79 54 46
## 653635 1546 925 1570 2107 1308 856
## 102466751 84 46 74 120 80 44
## 107985730 5 2 18 22 16 8
## 100302278 0 0 0 1 3 0
## 645520 0 1 1 2 1 1
# Make sure sample_table row names = column (sample) names of the matrix.
all(colnames(count_matrix) == rownames(sample_table)) # True
## [1] TRUE
# Now we can create the DESeq dataset object, adding batch into design.
dds = DESeqDataSetFromMatrix(countData = count_matrix,
colData = sample_table,
design = ~ batch + condition)
print(dds)
## class: DESeqDataSet
## dim: 39376 6
## metadata(1): version
## assays(1): counts
## rownames(39376): 100287102 653635 ... 4576 4571
## rowData names(0):
## colnames(6): DMSO_1 DMSO_2 ... GSK_1 GSK_2
## colData names(2): condition batch
Filter the counts, removing low counts as recommended by the authors of DESeq.
# We have 3 conditions, each with two replicates.
# Note that original authors set a much looser criterion (>= 2 reads) even with more samples.
keep = rowSums(counts(dds) >= 10) >= 2
sum(keep) # 21700
## [1] 21700
# Total rows: 39376, so 39376 - 21700 = 17676 removed.
dds = dds[keep, ]
# Tell DESeq the reference level (control).
# First check the condition.
print(dds$condition)
## [1] DMSO DMSO DAC DAC GSK GSK
## Levels: DAC DMSO GSK
print(dds$batch)
## [1] 1 2 1 2 1 2
## Levels: 1 2
# DMSO DMSO DAC DAC GSK GSK
# Levels: DAC DMSO GSK
# Set the condition.
dds$condition= factor(dds$condition, levels = c("DMSO", "DAC", "GSK"))
print(dds$condition)
## [1] DMSO DMSO DAC DAC GSK GSK
## Levels: DMSO DAC GSK
# Generate the log fold change table.
dds = DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
print(dds)
## class: DESeqDataSet
## dim: 21700 6
## metadata(1): version
## assays(4): counts mu H cooks
## rownames(21700): 100287102 653635 ... 4576 4571
## rowData names(30): baseMean baseVar ... deviance maxCooks
## colnames(6): DMSO_1 DMSO_2 ... GSK_1 GSK_2
## colData names(3): condition batch sizeFactor
Significance level set to 0.05 as that from original authors.
res_GSK = results(dds, contrast = c("condition", "GSK", "DMSO"), alpha = 0.05)
# Sort the rows by fold change.
res_GSK = res_GSK[order(res_GSK$log2FoldChange, decreasing = TRUE), ]
print(summary(res_GSK))
##
## out of 21700 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 4685, 22%
## LFC < 0 (down) : 4783, 22%
## outliers [1] : 0, 0%
## low counts [2] : 0, 0%
## (mean count < 3)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
##
## NULL
res_DAC = results(dds, contrast = c("condition", "DAC", "DMSO"), alpha = 0.05)
# Sort the rows by fold change.
res_DAC = res_DAC[order(res_DAC$log2FoldChange, decreasing = TRUE), ]
print(summary(res_DAC))
##
## out of 21700 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 5065, 23%
## LFC < 0 (down) : 5212, 24%
## outliers [1] : 0, 0%
## low counts [2] : 0, 0%
## (mean count < 3)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
##
## NULL
For visualization and ranking according to DESeq authors.
lfcShrink_GSK = lfcShrink(dds, coef = "condition_GSK_vs_DMSO", type = "apeglm")
lfcShrink_DAC = lfcShrink(dds, coef = "condition_DAC_vs_DMSO", type = "apeglm")
We look at some plots for data quality.
Most of the points (genes) are expected to be around the horizontal 0 line. True.
DESeq2::plotMA(lfcShrink_GSK, ylim = c(-3, 3))
DESeq2::plotMA(lfcShrink_DAC, ylim = c(-3, 3))
#### P-value distribution
Show expected results.
par(mfrow = c(1, 2))
ggplot(res_GSK, aes(x = pvalue)) + geom_histogram(bins = 100) +
ggtitle("Distribution of raw p values") +
theme_classic()
ggplot(res_DAC, aes(x = padj)) + geom_histogram(bins = 100) +
ggtitle("Distribution of raw p values") +
theme_classic()
Relative log expression plots. Normalization worked well.
EDASeq::plotRLE(counts(dds),
outline = FALSE,
col = colData(dds)$condition,
ylim = c(-3, 3),
main = "Raw Counts")
EDASeq::plotRLE(counts(dds, normalized = TRUE),
outline = FALSE,
col = colData(dds)$condition,
ylim = c(-3, 3),
main = "Normalized Counts")
Replicates have variations with a pattern that seems to suggest batch effect, but overall they were clustered together.
Normalized counts.
select = order(rowMeans(counts(dds, normalized = TRUE)), decreasing = TRUE)[1:100]
col_df <- data.frame(colData(dds))["condition"]
pheatmap(assay(normTransform(dds))[select, ], scale = "row",
cluster_rows = FALSE, show_rownames = FALSE,
annotation_col = col_df)
Counts with regularized log transformation(rlog).
# Regularized log transformation(rlog)
rlogT = rlog(dds)
pheatmap(assay(rlogT)[select, ], scale = "row",
cluster_rows = FALSE, show_rownames = FALSE,
annotation_col = col_df)
Counts with variance_stabilizing transformation(vst).
# Regularized log transformation(rlog)
vsd = vst(dds)
pheatmap(assay(vsd)[select, ], scale = "row",
cluster_rows = FALSE, show_rownames = FALSE,
annotation_col = col_df)
Use limma removeBatchBatchEffect to see if batch effect indeed exists.
Hard to tell here, but clearer look at the PCA plot below.
vsd2 = vsd
mat = assay(vsd2)
mmatrix = model.matrix(~ condition, colData(vsd2))
mat = removeBatchEffect(mat, batch = vsd2$batch, design = mmatrix)
assay(vsd2) = mat
pheatmap(assay(vsd2)[select, ], scale = "row",
cluster_rows = FALSE, show_rownames = FALSE,
annotation_col = col_df)
Groups are well separated and replicates are close to each other. Also the variance was well captured by the first two components.
With rlog-transformation.
plotPCA(rlogT, ntop = 500, intgroup = "condition") +
theme_classic()
With vst.
plotPCA(vst(dds), ntop = 500, intgroup = "condition") +
theme_classic()
Use vst-transformed and limma-processed data fro PCA. Now the replicates are much closer for control and GSK.
plotPCA(vsd2, ntop = 500, intgroup = "condition") +
theme_classic()
Replicates are closer.
sampleDists = dist(t(assay(rlogT)))
sampleDistMatrix = as.matrix(sampleDists)
rownames(sampleDistMatrix) = paste(rlogT$condition, rlogT$batch, sep = "-")
colnames(sampleDistMatrix) = NULL
colors <- colorRampPalette(rev(brewer.pal(9, "Blues")))(255)
pheatmap(sampleDistMatrix,
clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists,
cols = colors)
Show typical result. Overall the data quality is good.
plotDispEsts(dds)
Add gene symbol and name to the results.
annot_result = function(res) {
res$Symbol = mapIds(org.Hs.eg.db,
keys = row.names(res),
column = "SYMBOL",
keytype = "ENTREZID",
multiVals = "first")
res$GeneName = mapIds(org.Hs.eg.db,
keys = row.names(res),
column = "GENENAME",
keytype = "ENTREZID",
multiVals = "first")
res = res[, c(7, 8, 1:6)]
return(res)
}
res_GSK = annot_result(res_GSK)
res_DAC = annot_result(res_DAC)
About the matching of EntrezID and Symbol:
We would get quite a few NAs (~ 3%) in Symbol column:
print(nrow(res_GSK[is.na(res_DAC$Symbol), ]))
## [1] 723
print(nrow(res_DAC[is.na(res_DAC$Symbol), ]))
## [1] 723
This is because NCBI withdrew or replaced these EntrezIDs. So we can safely remove those entries.
res_GSK = res_GSK[!is.na(res_DAC$Symbol), ]
res_DAC = res_DAC[!is.na(res_DAC$Symbol), ]
print(dim(res_GSK))
## [1] 20977 8
print(dim(res_DAC))
## [1] 20977 8
Top 10 significantly (padj < 0.05) up-regulated genes.
print(res_GSK[res_GSK$padj < 0.05, ][1:10, c("Symbol", "GeneName", "log2FoldChange", "pvalue", "padj")])
## log2 fold change (MLE): condition GSK vs DMSO
## Wald test p-value: condition GSK vs DMSO
## DataFrame with 10 rows and 5 columns
## Symbol GeneName log2FoldChange pvalue
## <character> <character> <numeric> <numeric>
## 1769 DNAH8 dynein axonemal heav.. 12.6256 3.00101e-18
## 168400 DDX53 DEAD-box helicase 53 11.7535 6.38616e-16
## 139135 PASD1 PAS domain containin.. 10.9961 4.76582e-14
## 221400 TDRD6 tudor domain contain.. 10.9548 9.18422e-17
## 4100 MAGEA1 MAGE family member A1 10.8775 9.69441e-14
## 107986400 LOC107986400 uncharacterized LOC1.. 10.7869 1.71088e-13
## 140801 RPL10L ribosomal protein L1.. 10.6207 4.03494e-13
## 503637 DUXAP8 double homeobox A ps.. 10.6106 3.58835e-15
## 644623 TPTE2P2 TPTE2 pseudogene 2 10.6062 4.89920e-13
## 159163 RBMY1F RNA binding motif pr.. 10.5356 6.75567e-13
## padj
## <numeric>
## 1769 6.59129e-17
## 168400 1.16650e-14
## 139135 7.28811e-13
## 221400 1.80851e-15
## 4100 1.43793e-12
## 107986400 2.47178e-12
## 140801 5.62351e-12
## 503637 6.11683e-14
## 644623 6.77583e-12
## 159163 9.17385e-12
print(res_DAC[res_DAC$padj < 0.05, ][1:10, c("Symbol", "GeneName", "log2FoldChange", "pvalue", "padj")])
## log2 fold change (MLE): condition DAC vs DMSO
## Wald test p-value: condition DAC vs DMSO
## DataFrame with 10 rows and 5 columns
## Symbol GeneName log2FoldChange pvalue
## <character> <character> <numeric> <numeric>
## 1769 DNAH8 dynein axonemal heav.. 12.37099 1.38875e-17
## 168400 DDX53 DEAD-box helicase 53 11.08313 2.52659e-14
## 139135 PASD1 PAS domain containin.. 10.56688 4.34106e-13
## 200772 UICLM up-regulated in colo.. 10.15280 4.81644e-12
## 4100 MAGEA1 MAGE family member A1 10.10069 4.76705e-12
## 140801 RPL10L ribosomal protein L1.. 10.02164 7.64920e-12
## 503637 DUXAP8 double homeobox A ps.. 9.97498 1.39451e-13
## 4101 MAGEA2 MAGE family member A2 9.66811 4.62239e-11
## 266740 MAGEA2B MAGE family member A2B 9.64062 5.24897e-11
## 8991 SELENBP1 selenium binding pro.. 9.56324 1.04546e-10
## padj
## <numeric>
## 1769 2.41668e-16
## 168400 3.24037e-13
## 139135 4.89356e-12
## 200772 4.93003e-11
## 4100 4.88178e-11
## 140801 7.69174e-11
## 503637 1.65451e-12
## 4101 4.30313e-10
## 266740 4.84438e-10
## 8991 9.28249e-10
print("Number of genes common to both drugs:")
## [1] "Number of genes common to both drugs:"
print(sum(res_GSK[1:10, ]$Symbol %in% res_DAC[1:10, ]$Symbol))
## [1] 6
Top 10 significantly (padj < 0.05) down-regulated genes.
print(tail(res_GSK[res_GSK$padj < 0.05, ], 10)[, c("Symbol", "GeneName", "log2FoldChange", "pvalue", "padj")])
## log2 fold change (MLE): condition GSK vs DMSO
## Wald test p-value: condition GSK vs DMSO
## DataFrame with 10 rows and 5 columns
## Symbol GeneName log2FoldChange pvalue
## <character> <character> <numeric> <numeric>
## 125704 DIPK1C divergent protein ki.. -4.13694 1.89200e-09
## 2563 GABRD gamma-aminobutyric a.. -4.15627 2.53000e-07
## 28832 IGLJ2 immunoglobulin lambd.. -4.15936 1.95013e-03
## 54768 HYDIN HYDIN axonemal centr.. -4.19980 1.69319e-02
## 28755 TRAC T cell receptor alph.. -4.23417 2.91930e-06
## 100874051 LZTS1-AS1 LZTS1 antisense RNA 1 -4.46116 1.16429e-02
## 105378210 LOC105378210 uncharacterized LOC1.. -4.60916 2.83952e-06
## 9834 FAM30A family with sequence.. -4.62309 1.08903e-02
## 8328 GFI1B growth factor indepe.. -5.11121 2.03254e-03
## 729224 KIAA2012-AS1 KIAA2012 antisense R.. -5.71228 3.58213e-03
## padj
## <numeric>
## 125704 1.67100e-08
## 2563 1.61998e-06
## 28832 5.98894e-03
## 54768 4.01467e-02
## 28755 1.58412e-05
## 100874051 2.89869e-02
## 105378210 1.54237e-05
## 9834 2.74026e-02
## 8328 6.21650e-03
## 729224 1.03230e-02
print(tail(res_DAC[res_DAC$padj < 0.05, ], 10)[, c("Symbol", "GeneName", "log2FoldChange", "pvalue", "padj")])
## log2 fold change (MLE): condition DAC vs DMSO
## Wald test p-value: condition DAC vs DMSO
## DataFrame with 10 rows and 5 columns
## Symbol GeneName log2FoldChange pvalue
## <character> <character> <numeric> <numeric>
## 105371019 LOC105371019 uncharacterized LOC1.. -3.39580 1.90049e-02
## 6319 SCD stearoyl-CoA desatur.. -3.55122 1.44560e-22
## 167127 UGT3A2 UDP glycosyltransfer.. -3.55187 1.46825e-05
## 4843 NOS2 nitric oxide synthas.. -3.62293 1.69136e-13
## 107984343 LOC107984343 uncharacterized LOC1.. -3.64590 1.51609e-07
## 83729 INHBE inhibin subunit beta E -3.81679 5.41816e-06
## 125704 DIPK1C divergent protein ki.. -3.89736 5.33969e-10
## 105378210 LOC105378210 uncharacterized LOC1.. -4.00699 1.72031e-06
## 81501 DCSTAMP dendrocyte expressed.. -4.01942 1.09540e-05
## 90271 OLMALINC oligodendrocyte matu.. -5.02066 5.50651e-33
## padj
## <numeric>
## 105371019 4.12778e-02
## 6319 3.82090e-21
## 167127 6.34554e-05
## 4843 1.98821e-12
## 107984343 8.92543e-07
## 83729 2.50424e-05
## 125704 4.33325e-09
## 105378210 8.64538e-06
## 81501 4.84021e-05
## 90271 2.85182e-31
print("Number of genes common to both drugs:")
## [1] "Number of genes common to both drugs:"
print(sum(tail(res_GSK[res_GSK$padj < 0.05, ], 10)$Symbol %in%
tail(res_DAC[res_DAC$padj < 0.05,], 10)$Symbol))
## [1] 2
Save the files.
write.table(res_GSK, "MV411_day4_GSK032_400nM_NCBI_count_lfc.tsv",
sep = "\t", quote = FALSE)
write.table(res_DAC, "MV411_day4_DAC_400nM_NCBI_count_lfc.tsv",
sep = "\t", quote = FALSE)
Hover the mouse over a point to see the gene symbol, fold change and -log10(padj).
plotVolcano = function(res_DE, condition) {
ggplot(res_DE,
aes(x = log2FoldChange,
y = -log10(padj),
label = Symbol,
color = ifelse(res_DE$padj < 0.05 & abs(res_DE$log2FoldChange) > 1,
"yes", "not"))) +
geom_point() +
labs(title = paste0(condition, " vs control"),
xlab = "log2 fold change",
ylab = "-log10 adjusted p-value",
color = "> 2-fold & padj < 0.05") +
scale_color_manual(values = c("#56B4E9", "#D55E00")) +
theme(plot.title = element_text(size = 12, hjust = 0.5),
axis.title = element_text(size = 12),
legend.position = "right")
}
plotGSK = plotVolcano(res_GSK, "GSK")
ggplotly(plotGSK, tooltip = c("log2FoldChange", "-log10(padj)", "label"))
plotDAC = plotVolcano(res_DAC, "DAC")
ggplotly(plotDAC, tooltip = c("log2FoldChange", "-log10(padj)", "label"))
There were more up-regulated genes than down-regulated genes.
par(mfrow = c(1, 2))
# Select genes with 2-fold changes and sort them decreasingly.
DE_gene_GSK = res_GSK[abs(res_GSK$log2FoldChange) >= 1 & res_GSK$padj < 0.05, ]
DE_gene_GSK = DE_gene_GSK[, c("Symbol", "log2FoldChange", "padj")]
DE_gene_GSK = DE_gene_GSK[!is.na(DE_gene_GSK$Symbol), ]
DE_gene_GSK = DE_gene_GSK[order(-DE_gene_GSK$log2FoldChange), ]
print(nrow(DE_gene_GSK)) # 3435 genes
## [1] 3144
DE_gene_DAC = res_DAC[abs(res_DAC$log2FoldChange) >= 1 & res_DAC$padj < 0.05, ]
DE_gene_DAC = DE_gene_DAC[, c("Symbol", "log2FoldChange", "padj")]
DE_gene_DAC = DE_gene_DAC[!is.na(DE_gene_DAC$Symbol), ]
DE_gene_DAC = DE_gene_DAC[order(-DE_gene_DAC$log2FoldChange), ]
print(nrow(DE_gene_DAC)) # 4015 genes
## [1] 3827
up_list = list(GSK = as.vector(DE_gene_GSK[DE_gene_GSK$log2FoldChange >= 1, ]$Symbol),
DAC = as.vector(DE_gene_DAC[DE_gene_DAC$log2FoldChange >= 1, ]$Symbol))
print(sapply(names(up_list), function(x) length(up_list[[x]])))
## GSK DAC
## 2477 3017
down_list = list(GSK = as.vector(DE_gene_GSK[DE_gene_GSK$log2FoldChange < 1, ]$Symbol),
DAC = as.vector(DE_gene_DAC[DE_gene_DAC$log2FoldChange < 1, ]$Symbol))
print(sapply(names(up_list), function(x) length(down_list[[x]])))
## GSK DAC
## 667 810
ggvenn(up_list,
fill_color = c("white", "white"),
stroke_size = 0.5,
set_name_size = 6) +
ggtitle("Up-regulated genes") +
theme(plot.title = element_text(size = 16, hjust = 0.5))
ggvenn(down_list,
fill_color = c("white", "white"),
stroke_size = 0.5,
set_name_size = 6) +
ggtitle("Down-regulated genes") +
theme(plot.title = element_text(size = 16, hjust = 0.5))
Use clusterProfiler package. This package runs fgesa algorithm by default.
The following results appeared to indicate changes involving molecular interactions and cellular structures. The similarity in “Biological process” invoked by the two drugs was prominent.
goPath = function(ont, gene) {
enrichGO(gene = gene,
OrgDb = org.Hs.eg.db,
keyType = "SYMBOL",
ont = ont,
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05)
}
# Molecular functions
goGSK_MF = goPath("MF", DE_gene_GSK$Symbol)
print(head(goGSK_MF$Description, 10))
## [1] "structural constituent of chromatin"
## [2] "protein heterodimerization activity"
## [3] "MHC class II protein complex binding"
## [4] "extracellular matrix structural constituent"
## [5] "cytoskeletal motor activity"
## [6] "cargo receptor activity"
## [7] "histone kinase activity"
## [8] "MHC protein complex binding"
## [9] "single-stranded DNA binding"
## [10] "single-stranded DNA helicase activity"
# Biological process
goGSK_BP = goPath("BP", DE_gene_GSK$Symbol)
print(head(goGSK_BP$Description, 10))
## [1] "nucleosome assembly"
## [2] "nucleosome organization"
## [3] "nuclear division"
## [4] "meiotic cell cycle process"
## [5] "protein-DNA complex assembly"
## [6] "meiotic nuclear division"
## [7] "organelle fission"
## [8] "meiotic cell cycle"
## [9] "protein localization to CENP-A containing chromatin"
## [10] "meiosis I cell cycle process"
# Molecular functions
goDAC_MF = goPath("MF", DE_gene_DAC$Symbol)
print(head(goDAC_MF$Description, 10))
## [1] "structural constituent of chromatin" "protein heterodimerization activity"
## [3] "sulfur compound binding" "cytoskeletal motor activity"
## [5] "integrin binding" "heparin binding"
## [7] "glycosaminoglycan binding" "amyloid-beta binding"
## [9] "peptide antigen binding" "gated channel activity"
# Biological process
goDAC_BP = goPath("BP", DE_gene_DAC$Symbol)
print(head(goDAC_BP$Description, 10))
## [1] "nucleosome assembly"
## [2] "nuclear division"
## [3] "nucleosome organization"
## [4] "organelle fission"
## [5] "meiotic cell cycle process"
## [6] "protein-DNA complex assembly"
## [7] "meiotic nuclear division"
## [8] "protein localization to CENP-A containing chromatin"
## [9] "meiotic cell cycle"
## [10] "regulation of nuclear division"
plot the “Biological Process” network. (Hard to customize the figure, so as is)
goplot(goGSK_BP, showCategory = 5)
goplot(goDAC_BP, showCategory = 5)
gesaPath = function(gene_set, DE_genes, nPermSimple = 10000) {
pathways = gmtPathways(gene_set)
ranks = DE_genes$log2FoldChange
names(ranks) = rownames(DE_genes)
gesa_result = fgsea(pathways = pathways,
stats = ranks,
minSize = 15,
maxSize = 500,
nPermSimple = nPermSimple)
return(gesa_result)
}
Curated (C2) pathways.
Result includes Many cancer- and methylation-related pathways.
c2_gene_set = "gmt-file/c2.all.v2024.1.Hs.entrez.gmt" # Path to gmt file
c2_GSK = gesaPath(c2_gene_set, DE_gene_GSK)
c2_top10_up = c2_GSK[ES > 0][head(order(pval), 10), pathway]
cat("Top ten up-regulated pathways:\n")
## Top ten up-regulated pathways:
print(c2_top10_up)
## [1] "YEGNASUBRAMANIAN_PROSTATE_CANCER"
## [2] "ACEVEDO_METHYLATED_IN_LIVER_CANCER_DN"
## [3] "ZHONG_RESPONSE_TO_AZACITIDINE_AND_TSA_UP"
## [4] "MIKKELSEN_MEF_HCP_WITH_H3_UNMETHYLATED"
## [5] "MUELLER_METHYLATED_IN_GLIOBLASTOMA"
## [6] "REACTOME_NEUROTRANSMITTER_RECEPTORS_AND_POSTSYNAPTIC_SIGNAL_TRANSMISSION"
## [7] "RICKMAN_HEAD_AND_NECK_CANCER_B"
## [8] "HAMAI_APOPTOSIS_VIA_TRAIL_DN"
## [9] "SENGUPTA_NASOPHARYNGEAL_CARCINOMA_WITH_LMP1_DN"
## [10] "MCGARVEY_SILENCED_BY_METHYLATION_IN_COLON_CANCER"
c2_top10_down = c2_GSK[ES < 0][head(order(pval), 10), pathway]
cat("Top ten down-regulated pathways:\n")
## Top ten down-regulated pathways:
print(c2_top10_down)
## [1] "PUJANA_XPRSS_INT_NETWORK" "REACTOME_S_PHASE"
## [3] "CROONQUIST_NRAS_SIGNALING_DN" "KAUFFMANN_DNA_REPAIR_GENES"
## [5] "WP_RETINOBLASTOMA_GENE_IN_CANCER" "MORI_LARGE_PRE_BII_LYMPHOCYTE_UP"
## [7] "ISHIDA_E2F_TARGETS" "KANG_DOXORUBICIN_RESISTANCE_UP"
## [9] "PUJANA_BRCA_CENTERED_NETWORK" "REACTOME_SYNTHESIS_OF_DNA"
rank_GSK = DE_gene_GSK$log2FoldChange
names(rank_GSK) = rownames(DE_gene_GSK)
c2_pathways = gmtPathways(c2_gene_set)
plotEnrichment(c2_pathways[["ZHONG_RESPONSE_TO_AZACITIDINE_AND_TSA_UP"]], rank_GSK) +
labs(title = "RESPONSE_TO_AZACITIDINE_AND_TSA_UP genes")
Hallmark pathways.
Metabolic alteration, DNA repair and immune responses.
h_gene_set = "gmt-file/h.all.v2024.1.Hs.entrez.gmt"
h_GSK = gesaPath(h_gene_set, DE_gene_GSK)
h_top10_up = h_GSK[ES > 0][head(order(pval), 10), pathway]
cat("Top ten up-regulated pathways:\n")
## Top ten up-regulated pathways:
print(h_top10_up)
## [1] "HALLMARK_KRAS_SIGNALING_DN" "HALLMARK_SPERMATOGENESIS"
## [3] "HALLMARK_MYOGENESIS" "HALLMARK_FATTY_ACID_METABOLISM"
## [5] "HALLMARK_HYPOXIA" "HALLMARK_KRAS_SIGNALING_UP"
## [7] "HALLMARK_ADIPOGENESIS" "HALLMARK_INTERFERON_ALPHA_RESPONSE"
## [9] "HALLMARK_XENOBIOTIC_METABOLISM" "HALLMARK_GLYCOLYSIS"
h_top10_down = h_GSK[ES < 0][head(order(pval), 10), pathway]
cat("Top ten down-regulated pathways:\n")
## Top ten down-regulated pathways:
print(h_top10_down)
## [1] "HALLMARK_MTORC1_SIGNALING" "HALLMARK_MYC_TARGETS_V1"
## [3] "HALLMARK_MITOTIC_SPINDLE" "HALLMARK_IL2_STAT5_SIGNALING"
## [5] "HALLMARK_CHOLESTEROL_HOMEOSTASIS" "HALLMARK_P53_PATHWAY"
## [7] "HALLMARK_INTERFERON_GAMMA_RESPONSE" "HALLMARK_COAGULATION"
## [9] "HALLMARK_DNA_REPAIR" "HALLMARK_APOPTOSIS"
Reactome pathways.
reactome_set = "gmt-file/c2.cp.reactome.v2024.1.Hs.entrez.gmt"
react_GSK = gesaPath(reactome_set, DE_gene_GSK)
react_top10_up = react_GSK[ES > 0][head(order(pval), 10), pathway]
cat("Top ten up-regulated pathways:\n")
## Top ten up-regulated pathways:
print(react_top10_up)
## [1] "REACTOME_NEUROTRANSMITTER_RECEPTORS_AND_POSTSYNAPTIC_SIGNAL_TRANSMISSION"
## [2] "REACTOME_ACTIVATION_OF_NMDA_RECEPTORS_AND_POSTSYNAPTIC_EVENTS"
## [3] "REACTOME_TRANSMISSION_ACROSS_CHEMICAL_SYNAPSES"
## [4] "REACTOME_POST_TRANSLATIONAL_MODIFICATION_SYNTHESIS_OF_GPI_ANCHORED_PROTEINS"
## [5] "REACTOME_POTASSIUM_CHANNELS"
## [6] "REACTOME_SENSORY_PERCEPTION"
## [7] "REACTOME_METABOLISM_OF_AMINO_ACIDS_AND_DERIVATIVES"
## [8] "REACTOME_G_ALPHA_S_SIGNALLING_EVENTS"
## [9] "REACTOME_ION_CHANNEL_TRANSPORT"
## [10] "REACTOME_NEURONAL_SYSTEM"
react_top10_down = react_GSK[ES < 0][head(order(pval), 10), pathway]
cat("Top ten down-regulated pathways:\n")
## Top ten down-regulated pathways:
print(react_top10_down)
## [1] "REACTOME_S_PHASE"
## [2] "REACTOME_SYNTHESIS_OF_DNA"
## [3] "REACTOME_MITOTIC_G1_PHASE_AND_G1_S_TRANSITION"
## [4] "REACTOME_RMTS_METHYLATE_HISTONE_ARGININES"
## [5] "REACTOME_G2_M_DNA_DAMAGE_CHECKPOINT"
## [6] "REACTOME_PROCESSING_OF_DNA_DOUBLE_STRAND_BREAK_ENDS"
## [7] "REACTOME_HDMS_DEMETHYLATE_HISTONES"
## [8] "REACTOME_BASE_EXCISION_REPAIR_AP_SITE_FORMATION"
## [9] "REACTOME_REPLACEMENT_OF_PROTAMINES_BY_NUCLEOSOMES_IN_THE_MALE_PRONUCLEUS"
## [10] "REACTOME_INHIBITION_OF_DNA_RECOMBINATION_AT_TELOMERE"
react_pathways = gmtPathways(reactome_set)
plotEnrichment(react_pathways[["REACTOME_DNA_METHYLATION"]], rank_GSK) +
labs(title = "REACTOME_DNA_METHYLATION genes")
Curated (C2) pathways.
Similar responses to GSK.
c2_gene_set = "gmt-file/c2.all.v2024.1.Hs.entrez.gmt" # Path to gmt file
c2_DAC = gesaPath(c2_gene_set, DE_gene_DAC)
c2_top10_up = c2_DAC[ES > 0][head(order(pval), 10), pathway]
cat("Top ten up-regulated pathways:\n")
## Top ten up-regulated pathways:
print(c2_top10_up)
## [1] "YEGNASUBRAMANIAN_PROSTATE_CANCER"
## [2] "ACEVEDO_METHYLATED_IN_LIVER_CANCER_DN"
## [3] "RICKMAN_HEAD_AND_NECK_CANCER_B"
## [4] "LIANG_SILENCED_BY_METHYLATION_2"
## [5] "MIKKELSEN_MEF_HCP_WITH_H3_UNMETHYLATED"
## [6] "ZHONG_RESPONSE_TO_AZACITIDINE_AND_TSA_UP"
## [7] "HELLER_SILENCED_BY_METHYLATION_UP"
## [8] "MCGARVEY_SILENCED_BY_METHYLATION_IN_COLON_CANCER"
## [9] "MUELLER_METHYLATED_IN_GLIOBLASTOMA"
## [10] "TAVAZOIE_METASTASIS"
c2_top10_down = c2_DAC[ES < 0][head(order(pval), 10), pathway]
cat("Top ten down-regulated pathways:\n")
## Top ten down-regulated pathways:
print(c2_top10_down)
## [1] "PUJANA_BRCA_CENTERED_NETWORK"
## [2] "ZHOU_CELL_CYCLE_GENES_IN_IR_RESPONSE_6HR"
## [3] "CROONQUIST_NRAS_SIGNALING_DN"
## [4] "MORI_IMMATURE_B_LYMPHOCYTE_DN"
## [5] "REN_BOUND_BY_E2F"
## [6] "WP_RETINOBLASTOMA_GENE_IN_CANCER"
## [7] "REACTOME_S_PHASE"
## [8] "KONG_E2F3_TARGETS"
## [9] "REACTOME_DEPOSITION_OF_NEW_CENPA_CONTAINING_NUCLEOSOMES_AT_THE_CENTROMERE"
## [10] "REACTOME_ASSEMBLY_OF_THE_ORC_COMPLEX_AT_THE_ORIGIN_OF_REPLICATION"
rank_DAC = DE_gene_DAC$log2FoldChange
names(rank_DAC) = rownames(DE_gene_DAC)
plotEnrichment(c2_pathways[["ZHONG_RESPONSE_TO_AZACITIDINE_AND_TSA_UP"]], rank_DAC) +
labs(title = "RESPONSE_TO_AZACITIDINE_AND_TSA_UP genes")
Hallmark pathways.
h_gene_set = "gmt-file/h.all.v2024.1.Hs.entrez.gmt"
h_DAC = gesaPath(h_gene_set, DE_gene_DAC)
h_top10_up = h_DAC[ES > 0][head(order(pval), 10), pathway]
cat("Top ten up-regulated pathways:\n")
## Top ten up-regulated pathways:
print(h_top10_up)
## [1] "HALLMARK_MYOGENESIS"
## [2] "HALLMARK_KRAS_SIGNALING_UP"
## [3] "HALLMARK_KRAS_SIGNALING_DN"
## [4] "HALLMARK_HYPOXIA"
## [5] "HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION"
## [6] "HALLMARK_XENOBIOTIC_METABOLISM"
## [7] "HALLMARK_INFLAMMATORY_RESPONSE"
## [8] "HALLMARK_HEME_METABOLISM"
## [9] "HALLMARK_COAGULATION"
## [10] "HALLMARK_COMPLEMENT"
h_top10_down = h_DAC[ES < 0][head(order(pval), 10), pathway]
cat("Top ten down-regulated pathways:\n")
## Top ten down-regulated pathways:
print(h_top10_down)
## [1] "HALLMARK_MYC_TARGETS_V1" "HALLMARK_CHOLESTEROL_HOMEOSTASIS"
## [3] "HALLMARK_MYC_TARGETS_V2" "HALLMARK_MITOTIC_SPINDLE"
## [5] "HALLMARK_DNA_REPAIR" "HALLMARK_ANDROGEN_RESPONSE"
## [7] "HALLMARK_FATTY_ACID_METABOLISM" "HALLMARK_P53_PATHWAY"
## [9] "HALLMARK_TNFA_SIGNALING_VIA_NFKB" "HALLMARK_APOPTOSIS"
Reactome pathways.
reactome_set = "gmt-file/c2.cp.reactome.v2024.1.Hs.entrez.gmt"
react_DAC = gesaPath(reactome_set, DE_gene_DAC)
react_top10_up = react_DAC[ES > 0][head(order(pval), 10), pathway]
cat("Top ten up-regulated pathways:\n")
## Top ten up-regulated pathways:
print(react_top10_up)
## [1] "REACTOME_NEUROTRANSMITTER_RECEPTORS_AND_POSTSYNAPTIC_SIGNAL_TRANSMISSION"
## [2] "REACTOME_STIMULI_SENSING_CHANNELS"
## [3] "REACTOME_ACTIVATION_OF_NMDA_RECEPTORS_AND_POSTSYNAPTIC_EVENTS"
## [4] "REACTOME_ION_CHANNEL_TRANSPORT"
## [5] "REACTOME_TRANSMISSION_ACROSS_CHEMICAL_SYNAPSES"
## [6] "REACTOME_DISEASES_ASSOCIATED_WITH_O_GLYCOSYLATION_OF_PROTEINS"
## [7] "REACTOME_PHASE_I_FUNCTIONALIZATION_OF_COMPOUNDS"
## [8] "REACTOME_O_GLYCOSYLATION_OF_TSR_DOMAIN_CONTAINING_PROTEINS"
## [9] "REACTOME_CARDIAC_CONDUCTION"
## [10] "REACTOME_PI3K_AKT_SIGNALING_IN_CANCER"
react_top10_down = react_DAC[ES < 0][head(order(pval), 10), pathway]
cat("Top ten down-regulated pathways:\n")
## Top ten down-regulated pathways:
print(react_top10_down)
## [1] "REACTOME_DEPOSITION_OF_NEW_CENPA_CONTAINING_NUCLEOSOMES_AT_THE_CENTROMERE"
## [2] "REACTOME_S_PHASE"
## [3] "REACTOME_ASSEMBLY_OF_THE_ORC_COMPLEX_AT_THE_ORIGIN_OF_REPLICATION"
## [4] "REACTOME_CHROMATIN_MODIFICATIONS_DURING_THE_MATERNAL_TO_ZYGOTIC_TRANSITION_MZT"
## [5] "REACTOME_MATERNAL_TO_ZYGOTIC_TRANSITION_MZT"
## [6] "REACTOME_DNA_METHYLATION"
## [7] "REACTOME_SENESCENCE_ASSOCIATED_SECRETORY_PHENOTYPE_SASP"
## [8] "REACTOME_SIRT1_NEGATIVELY_REGULATES_RRNA_EXPRESSION"
## [9] "REACTOME_NEGATIVE_EPIGENETIC_REGULATION_OF_RRNA_EXPRESSION"
## [10] "REACTOME_ERCC6_CSB_AND_EHMT2_G9A_POSITIVELY_REGULATE_RRNA_EXPRESSION"
plotEnrichment(react_pathways[["REACTOME_DNA_METHYLATION"]], rank_DAC) +
labs(title = "REACTOME_DNA_METHYLATION genes")
Interactive figures with a dynamic data table.
See Glimma vignette for instruction.
glimmaMA(dds, groups = dds$condition)
## 0 of 21700 genes were filtered out in DESeq2 tests
We demonstrated the process of obtaining NCBI RNA-seq raw count, selecting dataset and using it for differential expression and pathway analysis. The results were consistent with DNA demethylation effects as expected.
Love MI, et al. Analyzing RNA-seq data with DESeq2. https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
Akalin, A. Computational Genomics with R, Chapter 8, RNA-seq analysis. https://compgenomr.github.io/book/rnaseqanalysis.html
Gu Y, et al. Biomedical Knowledge Mining using GOSemSim and clusterProfiler. Part II: Enrichment analysis. https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html
sessionInfo()
## R version 4.3.2 (2023-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 26100)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] plotly_4.10.4 ggvenn_0.1.10
## [3] Glimma_2.12.0 RColorBrewer_1.1-3
## [5] ggplot2_3.5.1 stringr_1.5.1
## [7] dplyr_1.1.4 pheatmap_1.0.12
## [9] hexbin_1.28.5 vsn_3.70.0
## [11] ReactomePA_1.46.0 msigdbr_7.5.1
## [13] fgsea_1.28.0 clusterProfiler_4.10.1
## [15] limma_3.58.1 EDASeq_2.36.0
## [17] ShortRead_1.60.0 GenomicAlignments_1.38.2
## [19] Rsamtools_2.18.0 Biostrings_2.70.3
## [21] XVector_0.42.0 BiocParallel_1.36.0
## [23] apeglm_1.24.0 DESeq2_1.42.1
## [25] SummarizedExperiment_1.32.0 MatrixGenerics_1.14.0
## [27] matrixStats_1.4.1 GenomicRanges_1.54.1
## [29] GenomeInfoDb_1.38.8 org.Hs.eg.db_3.18.0
## [31] AnnotationDbi_1.64.1 IRanges_2.36.0
## [33] S4Vectors_0.40.2 GEOquery_2.70.0
## [35] Biobase_2.62.0 BiocGenerics_0.48.1
##
## loaded via a namespace (and not attached):
## [1] splines_4.3.2 BiocIO_1.12.0 bitops_1.0-9
## [4] ggplotify_0.1.2 filelock_1.0.3 tibble_3.2.1
## [7] R.oo_1.27.0 polyclip_1.10-7 preprocessCore_1.64.0
## [10] graph_1.80.0 XML_3.99-0.17 lifecycle_1.0.4
## [13] edgeR_4.0.16 lattice_0.22-6 MASS_7.3-60
## [16] crosstalk_1.2.1 magrittr_2.0.3 sass_0.4.9
## [19] rmarkdown_2.29 jquerylib_0.1.4 yaml_2.3.10
## [22] cowplot_1.1.3 DBI_1.2.3 abind_1.4-8
## [25] zlibbioc_1.48.2 purrr_1.0.2 R.utils_2.12.3
## [28] ggraph_2.2.1 RCurl_1.98-1.16 yulab.utils_0.1.8
## [31] tweenr_2.0.3 rappdirs_0.3.3 GenomeInfoDbData_1.2.11
## [34] enrichplot_1.22.0 ggrepel_0.9.6 tidytree_0.4.6
## [37] reactome.db_1.86.2 codetools_0.2-20 DelayedArray_0.28.0
## [40] DOSE_3.28.2 xml2_1.3.6 ggforce_0.4.2
## [43] tidyselect_1.2.1 aplot_0.2.4 farver_2.1.2
## [46] viridis_0.6.5 BiocFileCache_2.10.2 jsonlite_1.8.9
## [49] tidygraph_1.3.1 bbmle_1.0.25.1 tools_4.3.2
## [52] progress_1.2.3 treeio_1.26.0 snow_0.4-4
## [55] Rcpp_1.0.13-1 glue_1.7.0 gridExtra_2.3
## [58] SparseArray_1.2.4 xfun_0.49 qvalue_2.34.0
## [61] withr_3.0.2 numDeriv_2016.8-1.1 BiocManager_1.30.25
## [64] fastmap_1.2.0 latticeExtra_0.6-30 digest_0.6.37
## [67] gridGraphics_0.5-1 R6_2.5.1 colorspace_2.1-0
## [70] GO.db_3.18.0 jpeg_0.1-10 biomaRt_2.58.2
## [73] RSQLite_2.3.9 R.methodsS3_1.8.2 tidyr_1.3.1
## [76] generics_0.1.3 data.table_1.16.4 rtracklayer_1.62.0
## [79] htmlwidgets_1.6.4 prettyunits_1.2.0 graphlayouts_1.2.1
## [82] httr_1.4.7 S4Arrays_1.2.1 scatterpie_0.2.4
## [85] graphite_1.48.0 pkgconfig_2.0.3 gtable_0.3.6
## [88] blob_1.2.4 hwriter_1.3.2.1 shadowtext_0.1.4
## [91] htmltools_0.5.8.1 scales_1.3.0 png_0.1-8
## [94] ggfun_0.1.8 knitr_1.49 rstudioapi_0.17.1
## [97] tzdb_0.4.0 reshape2_1.4.4 rjson_0.2.23
## [100] nlme_3.1-166 coda_0.19-4.1 curl_6.0.1
## [103] bdsmatrix_1.3-7 cachem_1.1.0 parallel_4.3.2
## [106] HDO.db_0.99.1 restfulr_0.0.15 pillar_1.10.0
## [109] vctrs_0.6.5 dbplyr_2.5.0 evaluate_1.0.1
## [112] readr_2.1.5 GenomicFeatures_1.54.4 mvtnorm_1.3-2
## [115] cli_3.6.2 locfit_1.5-9.10 compiler_4.3.2
## [118] rlang_1.1.3 crayon_1.5.3 labeling_0.4.3
## [121] interp_1.1-6 aroma.light_3.32.0 emdbook_1.3.13
## [124] affy_1.80.0 plyr_1.8.9 fs_1.6.5
## [127] stringi_1.8.4 viridisLite_0.4.2 deldir_2.0-4
## [130] babelgene_22.9 munsell_0.5.1 lazyeval_0.2.2
## [133] GOSemSim_2.28.1 Matrix_1.6-1.1 patchwork_1.3.0
## [136] hms_1.1.3 bit64_4.5.2 KEGGREST_1.42.0
## [139] statmod_1.5.0 igraph_2.1.2 memoise_2.0.1
## [142] affyio_1.72.0 bslib_0.8.0 ggtree_3.10.1
## [145] fastmatch_1.1-6 bit_4.5.0.1 gson_0.1.0
## [148] ape_5.8-1